The role of adenocarcinoma subtypes and immunohistochemistry in predicting lymph node metastasis in early invasive lung adenocarcinoma

Background Identifying lymph node metastasis during surgery for early invasive lung adenocarcinoma remains challenging. The aim of this study was to develop a nomogram that can be applied before the end of surgery to predict lymph node metastasis in patients with early invasive lung adenocarcinoma. Methods We included patients with invasive lung adenocarcinoma measuring ≤ 2 cm who underwent pulmonary resection with definite pathology at Qilu Hospital of Shandong University from January 2020 to January 2022. Preoperative biomarker results, clinical features, and computed tomography characteristics were collected. The enrolled patients were randomized into a training cohort and a validation cohort in a 7:3 ratio. The training cohort was used to construct the predictive model, while the validation cohort was used to test the model independently. Univariate and multivariate logistic regression analyses were performed to identify independent risk factors. The prediction model and nomogram were established based on the independent risk factors. Receiver operating characteristic (ROC) curves were used to assess the discrimination ability of the model. Calibration was assessed using the Hosmer–Lemeshow test and calibration curves. The clinical utility of the nomogram was assessed using decision curve analysis (DCA). Results The overall incidence of lymph node metastasis was 13.23% (61/461). Six indicators were found to be independently associated with lymph node metastasis: age (P < 0.001), serum amyloid (SA) (P = 0.008), carcinoma antigen 125 (CA125) (P = 0.042), mucinous component (P = 0.003), novel aspartic proteinase of the pepsin family A (Napsin A) (P = 0.007), and cytokeratin 5/6 (CK5/6) (P = 0.042). The area under the ROC curve (AUC) was 0.843 (95% CI: 0.779–0.908) in the training cohort and 0.838 (95% CI: 0.748–0.927) in the validation cohort.
The P value of the Hosmer–Lemeshow test was 0.0613 in the training cohort and 0.8628 in the validation cohort. The bias-corrected C-index was 0.8444 for the training cohort and 0.8375 for the validation cohort, demonstrating that the prediction model has good discriminative power and good calibration. Conclusions The nomogram showed excellent discrimination and calibration in predicting lymph node status in patients with invasive lung adenocarcinoma ≤ 2 cm. In addition, the model can be applied before the end of surgery and can inform clinical decision making. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-024-11843-4.


Introduction
Lung cancer (LC) is the second most prevalent tumor and remains by far the leading cause of malignancy-related deaths worldwide [1]. LC is commonly classified into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). Adenocarcinoma is the most important subtype of NSCLC and the most common type of LC. With the increasing use of low-dose spiral computed tomography (CT) in health screening and disease diagnosis, the incidence of lung cancer ≤ 2 cm has been increasing [2]. For early-stage lung adenocarcinoma, more thoracic surgeons are accepting segmental or subsegmental resection and selective lymph node dissection as the optimal treatment modality [3,4]. However, in some LC cases, lymph node metastasis (LNM) occurs in the early stages of the tumor. The incidence of LNM in LC cases with lesions ≤ 2 cm in diameter has been reported to be about 10% [5,6]. Emerging evidence suggests that lymph node metastasis is a risk factor for poor prognosis in patients with early-stage lung adenocarcinoma [7]. Unfortunately, the accuracy of preoperative lymph node staging by CT is only 45%-79% [8][9][10][11][12]. Preoperative mediastinoscopy and endobronchial ultrasound transbronchial needle aspiration are not routinely used in patients with clinical stage I disease, and these methods have produced a considerable number of false-negative results [13][14][15]. Complete clearance of metastatic lymph nodes during surgery plays a key role in improving the disease-free survival and overall survival of patients [16]. Therefore, it is necessary to assess lymph node metastasis in NSCLC accurately before surgery.
It has been shown that adenocarcinomas with micropapillary and solid growth patterns are more aggressive and have a poorer prognosis [17,18]. In addition, blood inflammatory markers and tumor markers can be used to predict lymph node metastasis in lung cancer [19][20][21][22]. CT remains the most widely used tool to assess tumor and lymph node involvement in patients with early-stage non-small cell lung cancer [8][9][10][11]. Some researchers claim that frozen sections are a key indicator to guide the approach to resection [23] and that it is feasible to report histological subtypes and other pathological features during surgery [24,25].
In our study, we explored the risk factors for lymph node metastasis in a cohort of patients with early invasive lung cancer and developed a nomogram for predicting the risk of lymph node metastasis based on patient clinical information, hematologic indicators, imaging features, and pathologic findings. The aim was for the nomogram to predict the probability of lymph node metastasis quickly and accurately before or during surgery, which may give surgeons a computational tool to support intraoperative decisions.

Patients
This study was approved by the Ethics Committee of Qilu Hospital, Shandong University (registration number: KYLL-202008-023-1), and all patients signed an informed consent form for the use of their clinical information prior to the procedure.
Patients with invasive adenocarcinoma from January 2020 to December 2021 at Qilu Hospital of Shandong University were retrospectively evaluated.
The inclusion criteria were: (1) a single intrapulmonary nodule suggested by chest CT within 1 month before surgery; (2) nodule with a maximum diameter ≤ 20 mm on CT; (3) pulmonary resection (lobectomy or sublobar resection) with systematic lymph node dissection; (4) complete pathological data and a pathological type of invasive lung adenocarcinoma; (5) no neoadjuvant chemotherapy or radiotherapy before surgery; (6) no pulmonary atelectasis or active inflammatory images of the lungs. The exclusion criteria were: (1) age < 18 years; (2) open-heart surgery; (3) incomplete perioperative data; (4) a history of malignant disease within 5 years; (5) concurrent acute infectious disease that can change the levels of systemic inflammatory markers; (6) presence of distant metastases.
A total of 2213 patients were initially screened, and after applying the above criteria, 522 patients with invasive lung adenocarcinoma with tumor size ≤ 2 cm were finally enrolled in our study. Figure 1 shows the flow chart of included patients.

Clinical data of patients
Clinicopathological information was collected from the patient record management system as follows: age, gender, presence of preoperative comorbidities [hypertension, diabetes mellitus, and chronic obstructive pulmonary disease (COPD)], history of smoking, body mass index (BMI), percent predicted forced expiratory volume in one second (FEV1% predicted), percent predicted maximum voluntary ventilation (MVV% predicted), and American Society of Anesthesiologists (ASA) score.

Imaging analysis
The morphological features on computed tomography included: location (central or peripheral), shape (regular or irregular), spiculation, calcification, cavity sign, bronchial sign, lobulation sign, pleural adhesion sign, vascular penetration sign, pleural effusion sign, maximum tumor diameter, lymph node enlargement sign, and consolidation to tumor ratio (CTR). Two radiologists measured each imaging feature independently, and a third radiologist with more than 20 years of experience in chest radiology reassessed the discrepancies. Any disagreements were resolved by consensus.
Central location was defined as a nodule located in the main, lobar, or segmental bronchi. Peripheral location was defined as a nodule located distal to the tertiary bronchi. Spiculation was defined as strands spreading from the nodule margins into the lung parenchyma without contacting the pleural surface. Calcification was defined as any one of the following patterns on CT imaging: laminated, central, diffuse, or popcorn. The cavity sign was defined as a gas-filled space seen as a lucent or low-attenuation region. The bronchial sign was defined as direct bronchial involvement of the nodule on CT images. Lobulation was defined as a wavy or scalloped portion of the lesion surface, together with strands extending from the nodule margins into the lung parenchyma. The pleural adhesion sign was defined as linear attenuation extending toward the pleura or toward the major or minor fissures. The vascular penetration sign was defined as a pulmonary artery crossing the nodule on CT images. The pleural effusion sign was defined as blunting of the costophrenic angle visible on CT images. The lymph node enlargement sign was defined as enlargement of mediastinal lymph nodes observed on CT images. CTR was defined as the ratio of the diameter of the solid component of the lung nodule to the maximum diameter of the nodule.

Fig. 1 Flow chart of this study

Histological evaluation
All pathological specimens were fixed in formalin, stained with hematoxylin-eosin, and evaluated by two experienced lung pathologists. Histopathological evaluation was performed by examining hematoxylin-eosin-stained slides with a light microscope. All specimens were classified according to the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification of adenocarcinoma of the lung [33]. The pathological lymph node status of patients was confirmed according to the 8th edition of the TNM lung cancer classification.
The percentage of each histological component (mucinous, lepidic, acinar, papillary, micropapillary, and solid pattern) was recorded in 5% increments, and the tumors were classified according to the predominant pattern. A pattern was considered present if it accounted for ≥ 5% of the tumor.
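The recording rule above can be illustrated with a short Python sketch. It is not the authors' procedure: the helper name `summarize_patterns`, the example percentages, and the requirement that the percentages sum to 100 are assumptions made for illustration. The source does not say how ties between patterns are resolved; here the first pattern listed wins.

```python
# Hypothetical helper illustrating the recording rule; not taken from the study.
PATTERNS = ("mucinous", "lepidic", "acinar", "papillary", "micropapillary", "solid")


def summarize_patterns(percentages, presence_cutoff=5):
    """Return (predominant pattern, sorted list of patterns present)."""
    unknown = set(percentages) - set(PATTERNS)
    if unknown:
        raise ValueError(f"unknown pattern(s): {sorted(unknown)}")
    # Components are recorded in 5% increments.
    if any(p < 0 or p % 5 != 0 for p in percentages.values()):
        raise ValueError("percentages must be non-negative multiples of 5")
    # Assumption: the recorded components account for the whole tumor.
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Predominant pattern = largest share (ties: first one encountered).
    predominant = max(percentages, key=percentages.get)
    # A pattern counts as present when it reaches the 5% cutoff.
    present = sorted(name for name, p in percentages.items() if p >= presence_cutoff)
    return predominant, present


# Made-up example tumor.
example = {"acinar": 60, "lepidic": 25, "micropapillary": 10, "solid": 5}
predominant, present = summarize_patterns(example)
print(predominant, present)
```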

DNA purification and quantification
All formalin-fixed paraffin-embedded (FFPE) specimens were cut into sections 5-8 μm thick. DNA and RNA were then extracted from 5-30 tissue sections containing at least 2% tumor cells using the FFPE DNA/RNA Nucleic Acid Extraction Kit (No. 8.0223601X036G, Xiamen Diagnostics, Xiamen, China). After isolation, the concentrations of DNA and RNA were determined using a microscopic spectrophotometer. The RNA concentrations ranged from 10 to 500 ng/μL and the DNA concentrations were > 2 ng/μL.

Immunohistochemistry Validation in Resected Patients
All IHC staining was performed in the clinical immunohistochemistry laboratory of our hospital pathology department. Briefly, specimens were sectioned at 5 μm, dewaxed, and incubated with primary antibody. The staining characteristics, intensity, and distribution of staining patterns were reviewed. A case was considered positive if more than 5% of the tumor cells showed the appropriate staining pattern; otherwise, the case was considered negative. Immunohistochemistry was verified for CK5/6, CK7, Napsin A, MUC-AC, P63, Ki-67 positive rate (%), CyclinD1, EMA, CD31, D2-40, etc.

Special staining in resected patients
The Periodic Acid-Schiff (PAS) reaction, the Periodic Acid-Schiff reaction with diastase (PAS-D), and elastic fiber staining are three special staining procedures commonly performed in a histology laboratory. The staining reaction was classified as positive or negative by three blinded observers.

Statistical analysis
All statistical analyses were performed using SPSS 26.0 (SPSS Inc., Chicago, Illinois, USA) and R statistical software (Windows version 4.2.1, http://www.r-project.org/). We used the "rms" package to plot the nomogram, "pROC" to plot the ROC curve, and "rmda" to plot the DCA curve. Categorical variables were compared using Pearson's Chi-square test or Fisher's exact test. Normally distributed continuous variables were expressed as mean ± standard deviation (SD) and compared using Student's t-test. Non-normally distributed continuous variables were expressed as medians (interquartile range [IQR]) and compared between two groups using the Mann-Whitney U test. Statistical significance was defined as a two-sided P value of less than 0.05.
Random assignment of patients was implemented in R. All enrolled patients were randomly assigned to the training and validation cohorts in a 7:3 ratio by random splitting of the sample. The training cohort was used to develop the prediction nomogram, while the validation cohort was used to verify its performance.
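As an illustration of a 7:3 random split, a minimal Python sketch follows. The study performed this step in R and does not report the code or the random seed, so the function name `split_cohort`, the seed, and the 100 placeholder patient IDs are all hypothetical.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2024):
    """Shuffle the IDs reproducibly and cut them into training/validation lists."""
    rng = random.Random(seed)       # fixed seed (arbitrary) so the split is repeatable
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs 1..100 stand in for real patient identifiers.
train_ids, valid_ids = split_cohort(range(1, 101))
print(len(train_ids), len(valid_ids))
```

Fixing the seed makes the assignment reproducible, which matters when the model is later re-fitted or audited.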

Construction of nomogram
The training cohort data were first analyzed by univariate logistic regression to identify potential risk factors. Those factors with P-values less than 0.05 in univariate

Nomogram performance
The performance of the predictive nomogram was assessed in terms of discriminative power, calibration, and clinical utility. Discriminative power is the ability of a model to distinguish correctly between events and non-events; ROC curves were used to assess it [34]. Calibration measures how well the predicted probability matches the observed outcome. The Hosmer-Lemeshow test was used to assess calibration, with a P value greater than 0.05 indicating satisfactory calibration [35]. A calibration plot was then drawn to assess calibration further, with internal validation by bootstrap resampling repeated 1000 times [36]. Clinical utility was evaluated using decision curve analysis (DCA) based on the net benefit at different threshold probabilities [37]. The optimal cutoff value was taken as the point at which the Youden index (sensitivity + specificity − 1) reached its maximum in the ROC curve analysis of the training cohort.
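The cutoff selection by Youden index can be sketched in a few lines of Python. This is only an illustration of the criterion: the study used R ("pROC"), and the function name `youden_cutoff` and the toy outcomes and predicted probabilities below are invented for the example. A case is called positive when its predicted probability is at or above the cutoff, which is itself an assumption about the convention.

```python
def youden_cutoff(y_true, y_prob):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")
    best = None
    # Every observed probability is a candidate cutoff.
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy data (1 = lymph node metastasis, 0 = none); not from the study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.07, 0.10, 0.12, 0.30, 0.40, 0.80]
cutoff, sens, spec = youden_cutoff(y_true, y_prob)
print(cutoff, sens, spec)
```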

Patient characteristics
A total of 522 patients were enrolled in this study.

Frequency of targeted gene alterations
Of the 522 patients, 46 underwent genetic alteration analysis using ARMS-PCR. Of these, 37 (80.4%) samples

Nomogram construction
All six independent risk factors for lymph node metastasis in invasive lung adenocarcinoma ≤ 2 cm were included in a logistic regression model. The probability of lymph node metastasis could be calculated by the following formula: $\ln\left(\frac{p}{1-p}\right) = -0.068 \times \text{age} + 0.025 \times \text{SA} + 0.098 \times \text{CA125} + 0.547 \times \text{mucinous (no = 0; yes = 1)} + 2.927 \times \text{CK5/6 (no = 0; yes = 1)} - 13.972$. Based on this equation, a nomogram of the predicted probability of lymph node metastasis in invasive lung adenocarcinoma within 2 cm was plotted using R statistical software (Fig. 3). The nomogram has 9 axes, of which axes 2-7 represent the six variables in the prediction model. By drawing a perpendicular line from each variable axis up to the points axis at the top, the score for each risk factor can be read off, and the scores are then summed to obtain a total score. The total score axis is then used to predict the probability of lymph node metastasis in invasive lung adenocarcinoma, which in turn can guide the surgical approach.
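To show how a logistic regression equation of this form is turned into a probability, a minimal Python sketch is given below. The coefficients are copied from the formula as printed in the text, which lists five terms and no Napsin A term, so none is included here; the units of the inputs are not stated in this section, and the function names are hypothetical. The sketch illustrates the arithmetic only and is not a validated clinical calculator.

```python
import math

# Coefficients as printed in the text (no Napsin A term appears there).
COEFFICIENTS = {
    "age": -0.068,
    "sa": 0.025,
    "ca125": 0.098,
    "mucinous": 0.547,   # no = 0, yes = 1
    "ck56": 2.927,       # no = 0, yes = 1
}
INTERCEPT = -13.972


def linear_predictor(age, sa, ca125, mucinous, ck56):
    """Compute ln(p / (1 - p)) from the printed coefficients."""
    values = {"age": age, "sa": sa, "ca125": ca125, "mucinous": mucinous, "ck56": ck56}
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


def logit_to_probability(z):
    """Invert the logit: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Each binary term shifts the log-odds by its coefficient, i.e. multiplies
# the odds by exp(coefficient), independent of the other inputs.
odds_ratio_ck56 = math.exp(COEFFICIENTS["ck56"])
print(round(odds_ratio_ck56, 2))
```

A nomogram is a graphical version of the same calculation: each axis rescales one term of the linear predictor to points, and the total points axis maps the sum back to a probability.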

Predictive performance and validation of the nomogram
The discrimination ability of the prediction model and nomogram was assessed by the ROC curve (Fig. 4). The area under the ROC curve (AUC) was 0.843 (95% CI: 0.779-0.908) for the training cohort and 0.838 (95% CI: 0.748-0.927) for the validation cohort, indicating that the nomogram has good predictive accuracy. The optimal cutoff of the ROC curve for the training cohort was 0.089, with a sensitivity of 0.795 and a specificity of 0.786 (Table 4). The Hosmer-Lemeshow test and calibration plots were used to assess calibration. The P value of the Hosmer-Lemeshow test was 0.0613 in the training cohort and 0.8628 in the validation cohort, indicating that the difference between the predicted and actual observed probabilities was negligible. Good calibration of the prediction nomogram is also shown by the calibration plots of the training cohort (Fig. 5A) and the validation cohort (Fig. 5B). The bias-corrected C-index was 0.8444 for the training cohort and 0.8375 for the validation cohort, further supporting the performance of the prediction model.
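For a binary outcome, the AUC (equivalently, the C-index) is the probability that a randomly chosen patient with metastasis receives a higher predicted probability than a randomly chosen patient without. The Python sketch below computes it by direct pair counting. The study's values were obtained in R; the function name `auc_concordance` and the toy data are hypothetical, and no bootstrap bias correction is attempted here.

```python
def auc_concordance(y_true, y_prob):
    """AUC as the fraction of concordant (event, non-event) pairs; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0   # concordant pair
            elif p == q:
                score += 0.5   # tied pair
    return score / (len(pos) * len(neg))


# Toy data (1 = lymph node metastasis, 0 = none); not from the study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.07, 0.10, 0.12, 0.30, 0.40, 0.80]
auc = auc_concordance(y_true, y_prob)
print(round(auc, 4))
```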

Clinical utility of the predictive nomogram
As shown in Fig. 6A and B, DCA was used to assess the clinical utility of the prediction nomogram. The nomogram provided a greater net benefit over a broad range of threshold probabilities for predicting the risk of lymph node metastasis in invasive lung adenocarcinoma within 2 cm in both the training and validation cohorts, showing that the nomogram is clinically useful. Figure 7A and B show the clinical impact curves (CIC) for the training cohort and the validation cohort, respectively. The curves show that a high benefit ratio is obtained within a probability threshold of 0.2-1.0. This suggests that the present model can be used clinically to predict the probability of lymph node metastasis in small invasive lung adenocarcinoma.
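The net benefit plotted in a decision curve is commonly defined, for a threshold probability pt, as TP/n − (FP/n) × pt/(1 − pt), and it is compared with the strategies of treating all patients and treating none (net benefit 0). The Python sketch below evaluates this standard definition on toy data. The study used the R package "rmda"; the function names and data here are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)   # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated as if metastasis were present."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    events = sum(1 for y in y_true if y == 1)
    return events / n - ((n - events) / n) * threshold / (1 - threshold)


# Toy data (1 = lymph node metastasis, 0 = none); not from the study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.07, 0.10, 0.12, 0.30, 0.40, 0.80]
nb_model = net_benefit(y_true, y_prob, threshold=0.35)
nb_all = net_benefit_treat_all(y_true, threshold=0.35)
print(nb_model, round(nb_all, 4))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is what the decision curves display across thresholds.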

Discussion
Among the hematologic components, our study identified SA and CA125 as predictors. CTR and tumor size were not shown to be associated with mediastinal lymph node metastasis in our study. The inclusion of immunohistochemical markers among the predictors is a novel aspect of our study. These previously unpublished observations have potential implications for the therapeutic management of early-stage lung adenocarcinoma, because the nomogram may be able to predict lymph node status before the end of surgery and guide surgeons in developing lymph node dissection strategies.
Many studies have examined the effect of age on lymph node metastasis in non-small cell lung cancer [26][38][39][40][41][42][43][44][45][46]. Some concluded that younger age is a risk factor, with a higher risk of lymph node metastasis in younger lung cancer patients [26][41][42][43]. Others showed that age had no significant effect on lymph node metastasis in lung cancer patients [44][45][46]. This discrepancy may be due to differences in the patients included, sample size, and analysis methods, so the different conclusions reached in previous studies are explainable. Based on our findings, we conclude that younger patients with invasive lung adenocarcinoma are at greater risk of lymph node metastasis and require more thorough and meticulous lymph node dissection.
To date, several case reports have associated elevated levels of SA with lung cancer [47][48][49]. In these reports, the amylase isozyme pattern in serum and tumor tissue showed a predominance of salivary-type amylase. Amylase levels were higher in tumor tissue than in normal lung tissue. Immunohistochemical studies localized amylase to tumor cells, and ultrastructural observation revealed electron-dense particles in the cytoplasm of tumor cells. These findings suggest that, in such cases, amylase is produced by the lung cancer itself. The reports raised the possibility that the serum amylase level may be a highly sensitive marker for lung cancer. Our findings show that lung adenocarcinoma patients with high blood SA concentrations had a higher risk of lymph node metastasis. CA125 has long been recognized as a classical tumor marker, not only as a predictor of lung cancer but also as a direct correlate of tumor infiltration and metastasis. It has been confirmed that CA125 is associated with lymph node metastasis in lung cancer [50,51]. CA125 is valuable in judging the extent of lung cancer metastasis and monitoring disease progression. This study demonstrated the importance of CA125 in determining whether lymph node metastasis is present in lung cancer patients. Surgeons should be more cautious when performing lymph node dissection during lung cancer surgery in patients with high serum CA125 levels.
Mucus is thought to play a key role in the development of cancer, as mucinous adenocarcinoma in many organs is associated with lymph node metastasis and poorer prognosis [52][53][54][55][56]. The mucinous glandular component of the tumor is histologically characterized by goblet and tall columnar epithelial cells that produce mucin, and the mucinous subtype is considered more malignant than other common subtypes of lung adenocarcinoma, such as squamous and alveolar subtypes [57][58][59]. Some reports with small sample sizes claim a low rate of lymph node metastasis in invasive mucinous adenocarcinoma [60][61][62][63], whereas other studies reach the opposite conclusion. The study by Zhu et al. claimed that the mucinous subtype is a risk factor for distant metastasis of lung adenocarcinoma [64]. Our findings suggest that the mucinous component is one of the risk factors for lymph node metastasis.
Napsin A is a human aspartic protease related to pepsin, gastricsin, renin, and cathepsin [65]. IHC studies have demonstrated that Napsin A is expressed in normal human type II pneumocytes and alveolar macrophages [66]. Strong cytoplasmic staining for Napsin A was observed in up to 87% of lung adenocarcinomas [67][68][69][70][71]. In contrast, CK5/6 is a sensitive and relatively specific marker of squamous differentiation [72][73][74]. The novelty of our study is that lymph node metastasis was linked to these two immunohistochemical markers for the first time, suggesting that CK5/6 and Napsin A can be used to predict lymph node metastasis in invasive adenocarcinoma. However, the reasons why CK5/6 and Napsin A can predict lymph node metastasis remain to be explored.
Our study has several advantages compared with other studies. First, we included CK5/6, Napsin A, and the mucinous component as factors for lymph node metastasis in a prediction model for the first time. Second, the factors in our prediction model are common and easily available in clinical practice. Third, our prediction model has excellent discriminatory power, calibration, and clinical utility. The model is easy to use in clinical practice, and the associated nomogram helps surgeons to quickly select an optimized surgical approach.
Our study has several limitations. First, the analysis was based on retrospective data from a single institution, and the possibility of selection bias cannot be ruled out; the results must be validated at other centers. Second, mutation testing was performed according to the patients' wishes. The genomic testing sample is therefore only a subset of the entire cohort, which makes it challenging to include mutation information in a multiple regression analysis. Third, the limited number of cases may lead to potential bias, especially in histological subtype analysis.

Conclusion
In this study, a clinical prediction model based on six risk factors was proposed. For invasive lung cancer, age, SA, CA125, mucinous component, CK5/6, and Napsin A are important risk factors associated with lymph node metastasis. Based on this nomogram, surgeons may be able to predict lymph node status before the end of surgery.

Fig. 3 Nomogram for predicting the probability of LNM in small invasive lung adenocarcinoma. SA, serum amyloid; CA125, carcinoma antigen 125; CK5/6, cytokeratin 5/6. The nomogram has 9 axes, of which axes 2-7 represent the six variables in the prediction model. By drawing a perpendicular line from each variable axis up to the points axis at the top, the score for each risk factor can be read off, and the scores are then summed to obtain a total score. The total score axis is then used to predict the probability of lymph node metastasis in invasive lung adenocarcinoma, which in turn can guide the surgical approach

Fig. 4 Results of ROC curve in the training and validation cohorts

Fig. 5 A, B Calibration curves of the prediction nomogram in the training cohort (A) and validation cohort (B). The X-axis represents the probability predicted by the nomogram and the Y-axis represents the actual probability of LNM in invasive lung adenocarcinoma within 2 cm. The black dashed line represents the ideal curve, the blue solid line represents the apparent curve (uncorrected), and the red solid line represents the bias-corrected curve obtained by the bootstrap method (B = 1000 repetitions). LNM, lymph node metastasis

Fig. 6 A, B Decision curve analysis of the prediction nomogram in the training cohort (A) and validation cohort (B). The y-axis measures the net benefit, the black line represents the assumption that no lymph node metastasis has occurred in invasive lung adenocarcinoma within 2 cm, and the gray line represents the assumption that lymph node metastasis has occurred in invasive lung adenocarcinoma measuring ≤ 2 cm. The blue line in Fig. 6A represents the training cohort, and the red line in Fig. 6B represents the validation cohort

Table 1
Patients' characteristics of the training cohort and validation

Table 2
Clinical characteristics of patients in the training and validation cohorts


Table 3
Univariate and multivariate logistic regression analysis of LNM factors in a training cohort

Table 4
Results of ROC curve for training cohort. TP, true positive; FP, false positive; TN, true negative; FN, false negative; TPR, true positive rate; FPR, false positive rate; TNR, true negative rate; FNR, false negative rate; PPV, positive predictive value; NPV, negative predictive value; FDR, false discovery rate